Search | VHL Regional Portal

Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan.

Kiraly, Atilla P; Cunningham, Corbin A; Najafi, Ryan; Nabulsi, Zaid; Yang, Jie; Lau, Charles; Ledsam, Joseph R; Ye, Wenxing; Ardila, Diego; McKinney, Scott M; Pilgrim, Rory; Liu, Yun; Saito, Hiroaki; Shimamura, Yasuteru; Etemadi, Mozziyar; Melnick, David; Jansen, Sunny; Corrado, Greg S; Peng, Lily; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi; Naidich, David P; Beladia, Neeral; Eswaran, Krish.

Radiol Artif Intell ; 6(3): e230079, 2024 May.

Article in English | MEDLINE | ID: mdl-38477661

ABSTRACT

Purpose To evaluate the impact of an artificial intelligence (AI) assistant for lung cancer screening on multinational clinical workflows. Materials and Methods An AI assistant for lung cancer screening was evaluated on two retrospective randomized multireader multicase studies where 627 (141 cancer-positive cases) low-dose chest CT cases were each read twice (with and without AI assistance) by experienced thoracic radiologists (six U.S.-based or six Japan-based radiologists), resulting in a total of 7524 interpretations. Positive cases were defined as those within 2 years before a pathology-confirmed lung cancer diagnosis. Negative cases were defined as those without any subsequent cancer diagnosis for at least 2 years and were enriched for a spectrum of diverse nodules. The studies measured the readers' level of suspicion (on a 0-100 scale), country-specific screening system scoring categories, and management recommendations. Evaluation metrics included the area under the receiver operating characteristic curve (AUC) for level of suspicion and sensitivity and specificity of recall recommendations. Results With AI assistance, the radiologists' AUC increased by 0.023 (0.70 to 0.72; P = .02) for the U.S. study and by 0.023 (0.93 to 0.96; P = .18) for the Japan study. Scoring system specificity for actionable findings increased 5.5% (57% to 63%; P < .001) for the U.S. study and 6.7% (23% to 30%; P < .001) for the Japan study. There was no evidence of a difference in corresponding sensitivity between unassisted and AI-assisted reads for the U.S. (67.3% to 67.5%; P = .88) and Japan (98% to 100%; P > .99) studies. Corresponding stand-alone AI AUC system performance was 0.75 (95% CI: 0.70, 0.81) and 0.88 (95% CI: 0.78, 0.97) for the U.S.- and Japan-based datasets, respectively. 
Conclusion The concurrent AI interface improved lung cancer screening specificity in both U.S.- and Japan-based reader studies, meriting further study in additional international screening environments. Keywords: Assistive Artificial Intelligence, Lung Cancer Screening, CT. Supplemental material is available for this article. Published under a CC BY 4.0 license.
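
As a rough illustration of the metrics named in this abstract, the sketch below checks the stated interpretation count (consistent with each of the 627 cases being read by six radiologists, once with and once without AI assistance) and computes an AUC from level-of-suspicion scores using the pairwise (Mann-Whitney) definition. The function name and the toy scores are hypothetical and are not study data; the study's own statistical procedure is not described in enough detail here to reproduce.

```python
from itertools import product


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as half). This equals the
    Mann-Whitney U statistic divided by the number of pairs."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Interpretation count stated in the abstract:
# 627 cases x 6 readers per case x 2 reads (with and without AI).
total_interpretations = 627 * 6 * 2

# Hypothetical level-of-suspicion scores on the 0-100 scale (not study data).
cancer_scores = [80, 65, 40, 90]
benign_scores = [10, 40, 30, 55, 20]
toy_auc = auc_from_scores(cancer_scores, benign_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a reader study of this size; a rank-based implementation would be the usual choice for larger data.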

Subject(s)

Artificial Intelligence , Early Detection of Cancer , Lung Neoplasms , Tomography, X-Ray Computed , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/epidemiology , Japan , United States/epidemiology , Retrospective Studies , Early Detection of Cancer/methods , Female , Male , Middle Aged , Aged , Sensitivity and Specificity , Radiographic Image Interpretation, Computer-Assisted/methods

An intentional approach to managing bias in general purpose embedding models.

Weng, Wei-Hung; Sellergren, Andrew; Kiraly, Atilla P; D'Amour, Alexander; Park, Jungyeon; Pilgrim, Rory; Pfohl, Stephen; Lau, Charles; Natarajan, Vivek; Azizi, Shekoofeh; Karthikesalingam, Alan; Cole-Lewis, Heather; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Shetty, Shravya; Prabhakara, Shruthi; Eswaran, Krish; Celi, Leo A G; Liu, Yun.

Lancet Digit Health ; 6(2): e126-e130, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38278614

ABSTRACT

Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components-GPPEs-from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.

Subject(s)

Delivery of Health Care , Machine Learning , Humans , Bias , Algorithms

Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists.

Kazemzadeh, Sahar; Yu, Jin; Jamshy, Shahar; Pilgrim, Rory; Nabulsi, Zaid; Chen, Christina; Beladia, Neeral; Lau, Charles; McKinney, Scott Mayer; Hughes, Thad; Kiraly, Atilla P; Kalidindi, Sreenivasa Raju; Muyoyeta, Monde; Malemela, Jameson; Shih, Ting; Corrado, Greg S; Peng, Lily; Chou, Katherine; Chen, Po-Hsuan Cameron; Liu, Yun; Eswaran, Krish; Tse, Daniel; Shetty, Shravya; Prabhakara, Shruthi.

Radiology ; 306(1): 124-137, 2023 01.

Article in English | MEDLINE | ID: mdl-36066366

ABSTRACT

Background The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions. Purpose To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists. Materials and Methods A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning ("noisy-student") were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity. Results A total of 165 754 images in 22 284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. 
In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%-80% per TB-positive patient detected. Conclusion A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.
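
The prespecified operating point mentioned in this abstract can be pictured as a fixed score threshold at which sensitivity and specificity are read off. The sketch below is only an illustration: the threshold value 0.45 and the WHO targets come from the abstract, while the function names, the "score >= threshold means positive" convention, and the toy scores are assumptions.

```python
WHO_TARGET_SENSITIVITY = 0.90  # target quoted in the abstract
WHO_TARGET_SPECIFICITY = 0.70  # target quoted in the abstract
OPERATING_POINT = 0.45         # prespecified DLS threshold from the abstract


def sensitivity_specificity(scores, labels, threshold):
    """Call a radiograph TB-positive when its score is >= threshold
    (an assumed convention) and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def meets_targets(sensitivity, specificity):
    """True when both WHO targets are reached."""
    return (sensitivity >= WHO_TARGET_SENSITIVITY
            and specificity >= WHO_TARGET_SPECIFICITY)


# Hypothetical model scores and reference labels (1 = confirmed TB).
toy_scores = [0.90, 0.50, 0.30, 0.70, 0.10, 0.46, 0.20, 0.60]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, OPERATING_POINT)
```

Lowering the threshold trades specificity for sensitivity, which is the sense in which an operating point can be chosen "to favor sensitivity".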

Subject(s)

Deep Learning , Tuberculosis, Pulmonary , Humans , Female , Middle Aged , Male , Radiography, Thoracic/methods , Retrospective Studies , Radiography , Tuberculosis, Pulmonary/diagnostic imaging , Radiologists , Sensitivity and Specificity

Simplified Transfer Learning for Chest Radiography Models Using Less Data.

Sellergren, Andrew B; Chen, Christina; Nabulsi, Zaid; Li, Yuanzhen; Maschinot, Aaron; Sarna, Aaron; Huang, Jenny; Lau, Charles; Kalidindi, Sreenivasa Raju; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Melnick, David; Liu, Yun; Eswaran, Krish; Tse, Daniel; Beladia, Neeral; Krishnan, Dilip; Shetty, Shravya.

Radiology ; 305(2): 454-465, 2022 11.

Article in English | MEDLINE | ID: mdl-35852426

ABSTRACT

Background Developing deep learning models for radiology requires large data sets and substantial computational resources. Data set size limitations can be further exacerbated by distribution shifts, such as rapid changes in patient populations and standard of care during the COVID-19 pandemic. A common partial mitigation is transfer learning by pretraining a "generic network" on a large nonmedical data set and then fine-tuning on a task-specific radiology data set. Purpose To reduce data set size requirements for chest radiography deep learning models by using an advanced machine learning approach (supervised contrastive [SupCon] learning) to generate chest radiography networks. Materials and Methods SupCon helped generate chest radiography networks from 821 544 chest radiographs from India and the United States. The chest radiography networks were used as a starting point for further machine learning model development for 10 prediction tasks (eg, airspace opacity, fracture, tuberculosis, and COVID-19 outcomes) by using five data sets comprising 684 955 chest radiographs from India, the United States, and China. Three model development setups were tested (linear classifier, nonlinear classifier, and fine-tuning the full network) with different data set sizes from eight to 85. Results Across a majority of tasks, compared with transfer learning from a nonmedical data set, SupCon reduced label requirements up to 688-fold and improved the area under the receiver operating characteristic curve (AUC) at matching data set sizes. At the extreme low-data regimen, training small nonlinear models by using only 45 chest radiographs yielded an AUC of 0.95 (noninferior to radiologist performance) in classifying microbiology-confirmed tuberculosis in external validation. At a more moderate data regimen, training small nonlinear models by using only 528 chest radiographs yielded an AUC of 0.75 in predicting severe COVID-19 outcomes. 
Conclusion Supervised contrastive learning enabled performance comparable to state-of-the-art deep learning models in multiple clinical tasks by using as few as 45 images and is a promising method for predictive modeling with use of small data sets and for predicting outcomes in shifting patient populations. © RSNA, 2022. Online supplemental material is available for this article.
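
For readers unfamiliar with supervised contrastive learning, the sketch below shows the general form of the loss as it is commonly described: each anchor is pulled toward samples with the same label and pushed away from all others. This is a minimal pure-Python illustration under stated assumptions (L2-normalized embeddings, a hypothetical temperature, toy data); the abstract does not describe the authors' implementation, and this should not be read as theirs.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have at least one
    other sample with the same label. Embeddings are assumed L2-normalized."""
    n = len(embeddings)
    anchor_losses = []
    for i in range(n):
        # Dot-product similarity of anchor i to every other sample,
        # scaled by the temperature.
        logits = {
            a: sum(x * y for x, y in zip(embeddings[i], embeddings[a])) / temperature
            for a in range(n) if a != i
        }
        positives = [a for a in logits if labels[a] == labels[i]]
        if not positives:
            continue  # an anchor without same-label partners contributes nothing
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        anchor_losses.append(
            -sum(logits[p] - log_denominator for p in positives) / len(positives)
        )
    return sum(anchor_losses) / len(anchor_losses)


# Toy data: two classes, each collapsed onto its own unit vector.
toy_embeddings = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_loss = supcon_loss(toy_embeddings, toy_labels, temperature=1.0)
```

A network pretrained with such a loss would then serve as the starting point for the linear, nonlinear, or fine-tuned setups that the abstract lists.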

Subject(s)

COVID-19 , Deep Learning , Humans , Radiography, Thoracic/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Pandemics , COVID-19/diagnostic imaging , Retrospective Studies , Radiography , Machine Learning

Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19.

Nabulsi, Zaid; Sellergren, Andrew; Jamshy, Shahar; Lau, Charles; Santos, Edward; Kiraly, Atilla P; Ye, Wenxing; Yang, Jie; Pilgrim, Rory; Kazemzadeh, Sahar; Yu, Jin; Kalidindi, Sreenivasa Raju; Etemadi, Mozziyar; Garcia-Vicente, Florencia; Melnick, David; Corrado, Greg S; Peng, Lily; Eswaran, Krish; Tse, Daniel; Beladia, Neeral; Liu, Yun; Chen, Po-Hsuan Cameron; Shetty, Shravya.

Sci Rep ; 11(1): 15523, 2021 09 01.

Article in English | MEDLINE | ID: mdl-34471144

ABSTRACT

Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to detect every possible condition by building multiple separate systems, each of which detects one or more pre-specified conditions. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For training and tuning the system, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist. Lastly, to facilitate the continued development of AI models for CXR, we release our collected labels for the publicly available dataset.
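
The simulated prioritization workflow can be pictured with a toy reading queue. The sketch below is not the authors' simulation: it assumes all cases are waiting at time zero, every read takes one time unit, and the AI scores and labels are hypothetical. It only shows why reading high-scoring cases first shortens the turnaround time for abnormal cases.

```python
def mean_abnormal_turnaround(reading_order, is_abnormal):
    """Mean completion time of abnormal cases when cases are read in the
    given order and each read takes one time unit."""
    finish_times = [
        position + 1
        for position, case in enumerate(reading_order)
        if is_abnormal[case]
    ]
    return sum(finish_times) / len(finish_times)


# Hypothetical AI abnormality scores and reference labels, in arrival order.
ai_scores = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]
is_abnormal = [False, True, False, True, False, False]

fifo_order = list(range(len(ai_scores)))
# Read the highest-scoring cases first; sorted() is stable, so ties keep
# their arrival order.
prioritized_order = sorted(fifo_order, key=lambda case: -ai_scores[case])

fifo_time = mean_abnormal_turnaround(fifo_order, is_abnormal)
prioritized_time = mean_abnormal_turnaround(prioritized_order, is_abnormal)
relative_reduction = (fifo_time - prioritized_time) / fifo_time
```

The size of the reduction in this toy depends entirely on the made-up queue and should not be compared with the 7-28% reported in the abstract.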

Subject(s)

COVID-19/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tuberculosis/diagnostic imaging , Adult , Aged , Algorithms , Case-Control Studies , China , Deep Learning , Female , Humans , India , Male , Middle Aged , Radiography, Thoracic , United States

Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation.

Majkowska, Anna; Mittal, Sid; Steiner, David F; Reicher, Joshua J; McKinney, Scott Mayer; Duggan, Gavin E; Eswaran, Krish; Cameron Chen, Po-Hsuan; Liu, Yun; Kalidindi, Sreenivasa Raju; Ding, Alexander; Corrado, Greg S; Tse, Daniel; Shetty, Shravya.

Radiology ; 294(2): 421-431, 2020 02.

Article in English | MEDLINE | ID: mdl-31793848

ABSTRACT

Background Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies. Purpose To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards. Materials and Methods Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network and ChestX-ray14 is a publicly available data set with 112 120 images. Natural language processing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to account for positive radiograph enrichment and estimate population-level performance. Results In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92).
With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively. Conclusion Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX-ray14 validation set images and 1962 test set images are provided. © RSNA, 2019. Online supplemental material is available for this article. See also the editorial by Chang in this issue.
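
Inverse probability weighting, as mentioned in this abstract, can be sketched as follows: each test image is weighted by the reciprocal of its probability of having been sampled into the enriched test set, so that weighted counts approximate the underlying population. The function name, the sampling probabilities, and the toy predictions below are all hypothetical; the abstract does not give the study's actual sampling scheme.

```python
def weighted_ppv(predictions, labels, sampling_probs):
    """Positive predictive value with each image weighted by the inverse
    of its probability of being sampled into the test set."""
    true_positive_weight = 0.0
    false_positive_weight = 0.0
    for predicted, label, prob in zip(predictions, labels, sampling_probs):
        if not predicted:
            continue
        weight = 1.0 / prob
        if label == 1:
            true_positive_weight += weight
        else:
            false_positive_weight += weight
    return true_positive_weight / (true_positive_weight + false_positive_weight)


# Hypothetical enriched test set: the first four images come from a stratum
# sampled with probability 0.5, the last two from one sampled with 0.1.
toy_predictions = [1, 1, 1, 0, 0, 1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_sampling_probs = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]

unadjusted_ppv = weighted_ppv(toy_predictions, toy_labels, [1.0] * 6)
adjusted_ppv = weighted_ppv(toy_predictions, toy_labels, toy_sampling_probs)
```

In this toy the adjusted value is lower because false positives from the under-sampled stratum each stand in for more population images, which is the kind of correction that enrichment makes necessary.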

Subject(s)

Radiographic Image Interpretation, Computer-Assisted/methods , Radiography, Thoracic/methods , Respiratory Tract Diseases/diagnostic imaging , Thoracic Injuries/diagnostic imaging , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Deep Learning , Female , Humans , Infant , Male , Middle Aged , Pneumothorax , Radiologists , Reference Standards , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity , Young Adult
